[VL] Adding configurations on max write file size #11606
zhouyuan wants to merge 2 commits into apache:main
Conversation
Signed-off-by: Yuan <yuanzhou@apache.org>
Force-pushed from babb80d to 1b8fe5d
```scala
.createWithDefault(10000)

val MAX_TARGET_FILE_SIZE_SESSION =
  buildConf("spark.gluten.sql.columnar.backend.velox.maxTargetFileSizeSession")
```
What does Session mean here?
docs/velox-configuration.md (Outdated)
| Configuration | Default | Description |
| --- | --- | --- |
| spark.gluten.sql.columnar.backend.velox.maxSpillFileSize | 1GB | The maximum size of a single spill file created |
| spark.gluten.sql.columnar.backend.velox.maxSpillLevel | 4 | The max allowed spilling level, with zero being the initial spilling level |
| spark.gluten.sql.columnar.backend.velox.maxSpillRunRows | 3M | The maximum number of rows in a single spill run |
| spark.gluten.sql.columnar.backend.velox.maxTargetFileSizeSession | 0B | The target size for each output file when writing data. 0 means no limit on target file size; the actual file size is then determined by other factors such as the max partition number and shuffle batch size. |
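For illustration, the new option could be set when launching a Gluten-enabled Spark session like any other backend config. This is a sketch, not from the PR: the config name follows the PR's current spelling (a review comment below asks to drop the Session suffix), and the 256MB value is an arbitrary example.

```shell
spark-sql \
  --conf spark.plugins=org.apache.gluten.GlutenPlugin \
  --conf spark.gluten.sql.columnar.backend.velox.maxTargetFileSizeSession=256MB
```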
Does it map to Iceberg's write.target-file-size-bytes? And does it honor spark.sql.iceberg.advisory-partition-size? If so, let's honor this config in Gluten as well.
If it only takes effect on Iceberg, we may just reuse Iceberg's config instead of adding a new one.
As of today Velox uses kMaxTargetFileSize to control the Parquet write file size, so it impacts all Parquet writes. In the current Iceberg write code path, this parameter is picked up to control the Parquet file size within each partition.
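To make the semantics concrete: a max-target-file-size cap typically drives file rolling in a writer, with 0 meaning "no limit". The sketch below is illustrative only; all names are hypothetical and none of this is Velox's actual API.

```scala
// Hypothetical sketch of target-file-size-driven file rolling.
// Not Velox code: names and structure are illustrative assumptions.
object FileRollingSketch {
  final case class WrittenFile(bytes: Long)

  // Split a stream of batch sizes into output files, starting a new file
  // once the current one would exceed the target size. A target of 0 (or
  // below) means no limit: everything lands in a single file.
  def rollFiles(batchBytes: Seq[Long], maxTargetFileSize: Long): Seq[WrittenFile] = {
    if (maxTargetFileSize <= 0) return Seq(WrittenFile(batchBytes.sum))
    val files = scala.collection.mutable.ArrayBuffer.empty[Long]
    var current = 0L
    for (b <- batchBytes) {
      // Roll to a new file when appending this batch would exceed the target.
      if (current > 0 && current + b > maxTargetFileSize) {
        files += current
        current = 0L
      }
      current += b
    }
    if (current > 0) files += current
    files.map(WrittenFile).toSeq
  }
}
```

With a 100-byte target, three 60-byte batches land in three files; with target 0, they land in one 180-byte file.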
In Iceberg Java, the logic for partition control is:
https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java#L695-L702

```java
.option(SparkWriteOptions.ADVISORY_PARTITION_SIZE)
.sessionConf(SparkSQLProperties.ADVISORY_PARTITION_SIZE)
.tableProperty(TableProperties.SPARK_WRITE_ADVISORY_PARTITION_SIZE_BYTES)
.defaultValue(defaultValue)
```

Precedence: write options > session conf > table property.
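That precedence chain can be sketched as plain Scala. This is only an illustration of the resolution order the Iceberg builder encodes; the names below are hypothetical, not Iceberg's actual API.

```scala
// Sketch of the resolution order: write options > session conf >
// table property > default. Names are illustrative assumptions.
object ConfResolution {
  def resolveLong(
      writeOptions: Map[String, String],
      sessionConf: Map[String, String],
      tableProps: Map[String, String],
      key: String,
      default: Long): Long =
    writeOptions.get(key)            // highest priority: per-write options
      .orElse(sessionConf.get(key))  // then the Spark session conf
      .orElse(tableProps.get(key))   // then the table property
      .map(_.toLong)
      .getOrElse(default)            // finally the built-in default
}
```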
For Iceberg, this config should be read from SparkWrite, the same way the codec is: https://github.com/apache/iceberg/blob/main/spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java#L138
```scala
.createWithDefault(10000)

val MAX_TARGET_FILE_SIZE_SESSION =
  buildConf("spark.gluten.sql.columnar.backend.velox.maxTargetFileSizeSession")
```
Please remove the Session suffix; it is the config type suffix used in the Velox code, not part of the config name itself.
Signed-off-by: Yuan <yuanzhou@apache.org>
What changes are proposed in this pull request?
Adds a config for the max write file size in Velox.
How was this patch tested?
Passes GHA and Velox unit tests.
Was this patch authored or co-authored using generative AI tooling?